25 research outputs found

    A New Deterministic Data Aggregation Method For Wireless Sensor Networks

    The processing capabilities of wireless sensor nodes make it possible to aggregate redundant data and so limit the total data flow over the network. The main property of a good aggregation algorithm is that it extracts the most representative data using minimal resources. From this point of view, sampling is a promising aggregation method: a sample acts as a surrogate for the whole data set and, once extracted, can be used to answer multiple kinds of queries (such as AVG, MEDIAN, SUM, and COUNT) at no extra cost to the sensor network. Sampling also preserves correlations between attributes of multi-dimensional data, which is valuable for further data mining. In this paper, we propose a novel, distributed, weighted sampling algorithm for sensor network data and compare it to an existing random sampling algorithm, which is the only other algorithm that works in this kind of setting. We run popular queries to evaluate our algorithm on a real-world data set covering climate data in the U.S. for the past 100 years. During testing, we focus on issues such as sample quality, network longevity, and energy and communication costs.
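    The idea that one extracted sample can serve several query types can be illustrated with a minimal sketch. It assumes a plain uniform random sample rather than the weighted algorithm the abstract proposes, and the names `draw_sample` and `estimate_queries` are hypothetical, introduced only for this illustration.

```python
import random
import statistics


def draw_sample(readings, k, seed=0):
    """Draw a uniform random sample of k readings (stand-in for the paper's method)."""
    rng = random.Random(seed)
    return rng.sample(readings, k)


def estimate_queries(sample, population_size, predicate):
    """Answer several aggregate queries from the same sample.

    population_size is the number of readings the sample stands for;
    predicate selects the readings to be counted by the COUNT query.
    """
    mean = statistics.fmean(sample)
    matching = sum(1 for value in sample if predicate(value))
    return {
        "AVG": mean,
        "MEDIAN": statistics.median(sample),
        # Scale sample statistics up to the full population.
        "SUM": mean * population_size,
        "COUNT": matching / len(sample) * population_size,
    }


if __name__ == "__main__":
    # Synthetic temperature-like readings, not data from the paper.
    gen = random.Random(42)
    readings = [gen.gauss(15.0, 8.0) for _ in range(10_000)]
    sample = draw_sample(readings, 500)
    print(estimate_queries(sample, len(readings), lambda v: v > 20.0))
```

    The sample is drawn once and every query is answered from it, so additional queries cost the network nothing beyond the initial extraction.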

    Deterministic Data Reduction in Sensor Networks Third

    A wide range of mining and analysis problems involve extracting knowledge from count data. Such data typically arises from transactional data sets; here we consider the case where it arises from a highly distributed source such as a sensor network. A general approach that scales well with the data is sampling, and we have proposed several deterministic streaming algorithms for efficiently reducing such data. Those algorithms achieve significantly better accuracy than random sampling for problems such as frequency estimation, correlation detection, association rules, and iceberg cube computations. In this paper, we consider the distributed version of the problem, specifically the case where the data originates from a sensor network. We engineer a fully distributed version of our algorithm, which builds a deterministic sample along a tree-like aggregation structure. We demonstrate that this distributed sample has about the same quality as the sample that would have been computed by running our (non-distributed) deterministic algorithm on the underlying data. We compare it to other (non-deterministic) sampling algorithms, focusing on issues such as sample quality, network longevity, and energy and communication costs. Index Terms: distributed algorithm, sampling, data reduction, data mining, count dataset, frequency estimation, associatio
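    Building a bounded-size sample along a tree-like aggregation structure can be sketched as follows. This is not the deterministic algorithm of the abstract, whose details are not given here. It is a randomized stand-in in which each node merges its children's samples in proportion to the number of readings they represent. The names `merge_samples` and `aggregate`, and the dictionary-based tree layout, are assumptions made for illustration.

```python
import random


def merge_samples(a, b, k, rng):
    """Merge two (sample, count) pairs into one sample of at most k items.

    Each slot is filled from one side with probability proportional to the
    number of readings that side still represents, so larger subtrees
    contribute more items. Randomized stand-in, not the paper's method.
    """
    (sample_a, count_a), (sample_b, count_b) = a, b
    pool_a, pool_b = list(sample_a), list(sample_b)
    rng.shuffle(pool_a)
    rng.shuffle(pool_b)
    remaining_a, remaining_b = count_a, count_b
    merged = []
    while len(merged) < k and remaining_a + remaining_b > 0:
        if rng.randrange(remaining_a + remaining_b) < remaining_a:
            merged.append(pool_a.pop())
            remaining_a -= 1
        else:
            merged.append(pool_b.pop())
            remaining_b -= 1
    return merged, count_a + count_b


def aggregate(node, children, readings, k, rng):
    """Return (sample, count) for the subtree rooted at node, built bottom-up."""
    local = list(readings.get(node, []))
    result = (rng.sample(local, min(k, len(local))), len(local))
    for child in children.get(node, []):
        child_result = aggregate(child, children, readings, k, rng)
        result = merge_samples(result, child_result, k, rng)
    return result


if __name__ == "__main__":
    # Hypothetical three-node network: a sink with two leaf sensors.
    children = {"sink": ["s1", "s2"]}
    readings = {"sink": [20.1], "s1": [18.5, 19.0, 18.7], "s2": [25.2, 24.9]}
    sample, count = aggregate("sink", children, readings, k=4, rng=random.Random(1))
    print(sample, count)
```

    Each node forwards at most `k` items plus one counter to its parent, so communication per link stays bounded however many readings the subtree holds.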